training script
Visualizing attention zones in machine reading comprehension models
Cui, Yiming, Zhang, Wei-Nan, Liu, Ting
The attention mechanism plays an important role in the machine reading comprehension (MRC) model. Here, we describe a pipeline for building an MRC model with a pretrained language model and visualizing the effect of each attention zone in different layers, which can indicate the explainability of the model. With the presented protocol and accompanying code, researchers can easily visualize the relevance of each attention zone in the MRC model. This approach can be generalized to other pretrained language models. For complete details on the use and execution of this protocol, please refer to Cui et al. (2022).
Circumventing Concept Erasure Methods For Text-to-Image Generative Models
Pham, Minh, Marshall, Kelly O., Cohen, Niv, Mittal, Govind, Hegde, Chinmay
Text-to-image generative models can produce photo-realistic images for an extremely broad range of concepts, and their usage has proliferated widely among the general public. On the flip side, these models have numerous drawbacks, including their potential to generate images featuring sexually explicit content, mirror artistic styles without permission, or even hallucinate (or deepfake) the likenesses of celebrities. Consequently, various methods have been proposed in order to "erase" sensitive concepts from text-to-image models. In this work, we examine five recently proposed concept erasure methods, and show that targeted concepts are not fully excised from any of these methods. Specifically, we leverage the existence of special learned word embeddings that can retrieve "erased" concepts from the sanitized models with no alterations to their weights. Our results highlight the brittleness of post hoc concept erasure methods, and call into question their use in the algorithmic toolkit for AI safety.
A Case Study on AI Engineering Practices: Developing an Autonomous Stock Trading System
Today, many systems use artificial intelligence (AI) to solve complex problems. While this often increases system effectiveness, developing a production-ready AI-based system is a difficult task. Thus, solid AI engineering practices are required to ensure the quality of the resulting system and to improve the development process. While several practices have already been proposed for the development of AI-based systems, detailed practical experiences of applying these practices are rare. In this paper, we aim to address this gap by collecting such experiences during a case study, namely the development of an autonomous stock trading system that uses machine learning functionality to invest in stocks. We selected 10 AI engineering practices from the literature and systematically applied them during development, with the goal to collect evidence about their applicability and effectiveness. Using structured field notes, we documented our experiences. Furthermore, we also used field notes to document challenges that occurred during the development, and the solutions we applied to overcome them. Afterwards, we analyzed the collected field notes, and evaluated how each practice improved the development. Lastly, we compared our evidence with existing literature. Most applied practices improved our system, albeit to varying extent, and we were able to overcome all major challenges. The qualitative results provide detailed accounts about 10 AI engineering practices, as well as challenges and solutions associated with such a project. Our experiences therefore enrich the emerging body of evidence in this field, which may be especially helpful for practitioner teams new to AI engineering.
Train ML models - Azure Machine Learning
Azure Machine Learning provides multiple ways to submit ML training jobs. In this article, you'll learn how to submit jobs using each of these methods. SDK v2 is currently in public preview. The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Data Parallelism and Distributed Deep Learning at production scale (part 2)
Lastly, our optimiser is wrapped by Horovod's implementation for distributed optimisation (which handles the all-gather and all-reduce MPI operations). We next assign training callbacks to GPU processors based on each processor's (unique) global rank. By default, rank 0 is designated as the root node. Some operations only need to execute on a single node (for example, using a model checkpoint to save model weights to file). Each processor effectively runs its own training job, which optionally prints training accuracy, loss, and custom metrics to CloudWatch.
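The rank-based assignment described in this excerpt can be sketched in plain Python. This is a minimal illustration, not the article's code: `build_callbacks` and the callback names are hypothetical, and the rank is passed in as an argument, whereas in a Horovod job it would come from `hvd.rank()`.

```python
from typing import List


def build_callbacks(rank: int, root_rank: int = 0) -> List[str]:
    """Return the (hypothetical) callback names for the worker with this global rank."""
    # Every worker logs its own metrics, since each one runs its own training job.
    callbacks = ["log_metrics"]
    if rank == root_rank:
        # Only the root worker saves checkpoints, so that workers do not
        # all write the same weights file.
        callbacks.append("save_checkpoint")
    return callbacks


if __name__ == "__main__":
    # Simulate a job with four workers.
    for rank in range(4):
        print(rank, build_callbacks(rank))
```

Keeping file-writing callbacks on a single rank avoids several processes racing to write the same checkpoint.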
Large-Scale Distributed Training with TorchX and Ray
Ray was created at RISELab by the founders of Anyscale. It provides a rich set of native libraries for ML workloads and a general-purpose core for building distributed applications. On top of the libraries provided by Ray, there is a rich ecosystem of libraries and integrations that enable PyTorch users to achieve greater scale. Two great examples are PyTorch Distributed and PyTorch Lightning, which enable users to take advantage of the amazing PyTorch and Ray capabilities together. This blog introduces how TorchX extends its functionality to submit PyTorch jobs via a newly developed Ray Scheduler.
Getting Started with PyTorch Image Models (timm): a practitioner's guide
PyTorch Image Models (timm) is a library for state-of-the-art image classification, containing a collection of image models, optimizers, schedulers, augmentations and much more; it was recently named the top trending library on papers-with-code of 2021! Whilst there are an increasing number of low and no code solutions which make it easy to get started with applying Deep Learning to computer vision problems, in my current role as part of Microsoft CSE, we frequently engage with customers who wish to pursue custom solutions tailored to their specific problem, utilizing the latest and greatest innovations to exceed the performance level offered by these services. Due to the rate at which new architectures and training techniques are introduced into this rapidly moving field, whether you are a beginner or an expert, it can be difficult to keep up with the latest practices, and challenging to know where to start when approaching new vision tasks with the intention of reproducing results similar to those presented in academic benchmarks. Whether I'm training from scratch or finetuning existing models on new tasks, when I'm looking to leverage pre-existing components to speed up my workflow, timm is one of my favourite libraries for computer vision in PyTorch. However, whilst timm contains reference training and validation scripts for reproducing ImageNet training results, and its core components are covered in the official documentation and the timmdocs project, the sheer number of features that the library provides can make it difficult to know where to get started when applying them in custom use-cases. The purpose of this guide is to explore timm from a practitioner's point of view, focusing on how to use some of the features and components included in timm in custom training scripts.
The focus is not to explore how or why these concepts work, or how they are implemented in timm; for this, links to the original papers will be provided where appropriate, and I would recommend timmdocs to learn more about timm's internals. Additionally, this article is by no means exhaustive; the areas selected are based upon my personal experience using this library. All information here is based on timm 0.5.4, which was recently released at the time of writing. Whilst this article can be read in order, it may also be useful as a reference for a particular part of the library. For ease of navigation, a table of contents is presented below. Tl;dr: If you just want to see some working code that you can use directly, all of the code required to replicate this post is available as a GitHub gist here. One of the most popular features of timm is its large and ever-growing collection of model architectures.
Choose the best data source for your Amazon SageMaker training job
Amazon SageMaker is a managed service that makes it easy to build, train, and deploy machine learning (ML) models. Data scientists use SageMaker training jobs to easily train ML models; you don’t have to worry about managing compute resources, and you pay only for the actual training time. Data ingestion is an integral part of […]
Introducing Distributed Data Parallel support on PyTorch Windows - Microsoft Open Source Blog
Model training has been, and for the foreseeable future will remain, one of the most frustrating things machine learning developers face. It takes quite a long time, and there is little people can do about it. If you have the luxury (especially at this moment in time) of having multiple GPUs, you are likely to find Distributed Data Parallel (DDP) helpful for model training. DDP performs model training across multiple GPUs in a transparent fashion. The GPUs can be on a single machine or spread across multiple machines.
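The idea behind data-parallel training can be illustrated with a small pure-Python simulation. It is a sketch under stated assumptions, not the PyTorch DDP API: the function names (`local_gradient`, `all_reduce_mean`, `ddp_step`) and the toy linear model are hypothetical. Each simulated worker computes a gradient on its own data shard, the gradients are averaged (standing in for an all-reduce), and every replica applies the same update.

```python
from typing import List, Tuple

Shard = List[Tuple[float, float]]


def local_gradient(w: float, shard: Shard) -> float:
    """Gradient of the mean squared error for y ~ w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: List[float]) -> float:
    """Stand-in for an all-reduce: every worker ends up with the mean value."""
    return sum(values) / len(values)


def ddp_step(w: float, shards: List[Shard], lr: float) -> float:
    """One simulated data-parallel step; all replicas share the same weight w."""
    grads = [local_gradient(w, shard) for shard in shards]  # one per worker
    avg_grad = all_reduce_mean(grads)
    # Every replica applies the identical averaged gradient, so weights stay in sync.
    return w - lr * avg_grad


if __name__ == "__main__":
    # Toy data for y = 2x, split evenly across two simulated workers.
    shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
    w = 0.0
    for _ in range(50):
        w = ddp_step(w, shards, lr=0.05)
    print(w)
```

With equally sized shards, the averaged gradient equals the gradient over the full batch, which is why the replicas behave like a single model trained on all the data.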